
    SSP: An interval integer linear programming for de novo transcriptome assembly and isoform discovery of RNA-seq reads

    Abstract: Recent advances in sequencing technologies have provided a wealth of RNA-seq datasets for transcriptome analysis. However, reconstructing full-length isoforms and estimating transcript expression levels at low cost remain challenging tasks. We propose a novel de novo method named SSP that incorporates interval integer linear programming to resolve alternatively spliced isoforms and reconstruct the whole transcriptome from short reads. Experimental results show that SSP is fast and precise in determining different alternatively spliced isoforms, along with estimating the abundances of the reconstructed transcripts. The SSP software package is available at http://www.bioinf.cs.ipm.ir/software/ssp
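    The abstract does not spell out SSP's interval ILP formulation, so as a rough illustration of the general approach only, the sketch below solves a toy isoform-selection problem as an integer linear program with the open-source PuLP package: choose the fewest candidate isoforms (hypothetical data, not the authors' model) that explain every observed splice junction.

```python
# Illustrative only: a toy set-cover-style ILP for isoform selection,
# not SSP's actual interval ILP (the abstract does not give it).
import pulp

# Hypothetical candidate isoforms, each represented by the splice junctions it contains.
isoforms = {
    "iso1": {("e1", "e2"), ("e2", "e3")},
    "iso2": {("e1", "e3")},
    "iso3": {("e2", "e3"), ("e3", "e4")},
}
observed_junctions = {("e1", "e2"), ("e2", "e3"), ("e3", "e4")}

prob = pulp.LpProblem("isoform_selection", pulp.LpMinimize)
use = {name: pulp.LpVariable(f"use_{name}", cat="Binary") for name in isoforms}

# Objective: explain the observed junctions with as few isoforms as possible.
prob += pulp.lpSum(use.values())

# Each observed junction must be covered by at least one selected isoform.
for j in observed_junctions:
    prob += pulp.lpSum(use[n] for n, juncs in isoforms.items() if j in juncs) >= 1

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print({name: int(var.value()) for name, var in use.items()})  # expect iso1 and iso3
```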

    Impact of RNA structure on the prediction of donor and acceptor splice sites

    BACKGROUND: Gene identification in genomic DNA sequences by computational methods has become an important task in bioinformatics, and computational gene prediction tools are now essential components of every genome sequencing project. Prediction of splice sites is a key step in all gene structural prediction algorithms. RESULTS: We investigated the role of mRNA secondary structures and their information content in five vertebrate and plant splice-site datasets. We selected 900-nucleotide sequences centered at each (real or decoy) donor and acceptor site and predicted their corresponding RNA structures with the Vienna software. Then, based on whether each nucleotide lies in a stem or not, the conventional four-letter nucleotide alphabet was translated into an eight-letter alphabet. Zero-, first- and second-order Markov models were selected as the signal detection methods. Applying the eight-letter alphabet rather than the four-letter alphabet considerably increases the accuracy of both donor and acceptor site predictions for the higher-order Markov models. CONCLUSION: Our results imply that RNA structure carries important information, and future gene prediction programs can take advantage of it.
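    The encoding step is concrete enough to sketch. Below is a minimal, hypothetical Python illustration (not the authors' code) of the eight-letter alphabet: each base becomes uppercase if paired (stem) or lowercase if unpaired (loop) according to a Vienna-style dot-bracket structure, and a first-order Markov model with add-one smoothing scores candidate sites by log-odds. The toy training windows are invented; real ones would be 900 nt and folded by the Vienna software.

```python
import math

ALPHABET = list("acguACGU")  # lowercase = unpaired (loop), uppercase = paired (stem)

def encode(seq, dot_bracket):
    """Translate a four-letter RNA sequence into the eight-letter alphabet using
    a Vienna-style dot-bracket structure: paired bases become uppercase."""
    return "".join(b.upper() if s in "()" else b.lower()
                   for b, s in zip(seq, dot_bracket))

def train_markov1(samples):
    """First-order Markov model P(next | current) with add-one smoothing."""
    counts = {a: {b: 1 for b in ALPHABET} for a in ALPHABET}
    for s in samples:
        for a, b in zip(s, s[1:]):
            counts[a][b] += 1
    return {a: {b: counts[a][b] / sum(counts[a].values()) for b in ALPHABET}
            for a in ALPHABET}

def log_odds(s, site_model, decoy_model):
    """Higher scores mean the window looks more like a real splice site."""
    return sum(math.log(site_model[a][b] / decoy_model[a][b])
               for a, b in zip(s, s[1:]))

# Invented toy windows around donor sites, with invented structures.
real = [encode("guaagu", "((..))"), encode("gugagu", "((..))")]
decoy = [encode("gcaaau", "......"), encode("auaagc", "(....)")]
site_m, decoy_m = train_markov1(real), train_markov1(decoy)
print(log_odds(encode("guaagu", "((..))"), site_m, decoy_m))
```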

    Impact of residue accessible surface area on the prediction of protein secondary structures

    Background: Accurate prediction of protein secondary structure remains one of the challenging problems in bioinformatics. It has previously been suggested that amino acid relative solvent accessibility (RSA) might be an effective factor for increasing the accuracy of protein secondary structure prediction. Previous studies have either used a single constant threshold to classify residues into discrete classes (buried vs. exposed) or used real-valued predicted RSAs in their prediction methods. Results: We studied the effect of applying different RSA threshold types (namely, fixed thresholds vs. residue-dependent thresholds) on a variety of secondary structure prediction methods. Using DSSP-assigned RSA values, we found that improvement in prediction accuracy strictly depends on the selected threshold(s). Furthermore, we showed that a single threshold for all amino acids is not the best choice; with residue-dependent thresholds, prediction improved for most residues. Next, we considered predicted RSA values, since in the real-world problem the protein sequence is the only available information. We first predicted RSA classes with the RVP-net program and then used these data in our method; this approach also improved prediction. Conclusion: The success of applying RSA information to different secondary structure prediction methods suggests that prediction accuracy can be improved independently of the prediction approach. Solvent accessibility can thus be considered a rich source of information for improving these methods.
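    A minimal sketch of the thresholding idea, with hypothetical cutoff values (the paper's residue-dependent thresholds are not given in the abstract):

```python
# Sketch only: the threshold values below are invented, not the paper's.
FIXED_THRESHOLD = 0.25  # hypothetical single relative-accessibility cutoff

# Hypothetical residue-dependent cutoffs; a real study would fit these per amino acid.
RESIDUE_THRESHOLDS = {"A": 0.20, "L": 0.15, "K": 0.40, "D": 0.35, "G": 0.25}

def classify(residue, rsa, residue_dependent=True):
    """Return 'exposed' or 'buried' given a residue's relative solvent accessibility."""
    if residue_dependent:
        cutoff = RESIDUE_THRESHOLDS.get(residue, FIXED_THRESHOLD)
    else:
        cutoff = FIXED_THRESHOLD
    return "exposed" if rsa >= cutoff else "buried"

print(classify("K", 0.30))                           # buried under the lysine-specific cutoff
print(classify("K", 0.30, residue_dependent=False))  # exposed under the single fixed cutoff
```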

    A pairwise residue contact area-based mean force potential for discrimination of native protein structure

    Background: An energy function that can distinguish a correct protein fold from incorrect ones is very important for protein structure prediction and protein folding. Knowledge-based mean-force potentials are certainly the most popular type of interaction function for protein threading. They are derived from statistical analyses of interacting groups in experimentally determined protein structures and are developed at the atom or amino acid level. Based on orientation-dependent contact area, we have developed a new type of knowledge-based mean-force potential. Results: We developed a new approach to calculating a knowledge-based potential of mean force using pairwise residue contact areas. To test its performance, we applied it to several decoy sets to measure its ability to discriminate native structures from decoys. The potential distinguished native structures from the decoys in most cases, and the calculated Z-scores were quite high for all protein datasets. Conclusions: This knowledge-based potential of mean force can be used in protein structure prediction, fold recognition, comparative modelling and molecular recognition. The program is available at http://www.bioinf.cs.ipm.ac.ir/softwares/surfield
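    The abstract does not give the exact functional form, so the sketch below uses the standard inverse-Boltzmann construction for a knowledge-based potential, E(a, b) = -ln(f_obs(a, b) / f_ref(a, b)) in units of kT, with invented contact-area counts, and computes a Z-score of a native-like model against decoys in the way such potentials are usually evaluated:

```python
# Hedged sketch with invented numbers, not the authors' potential.
import math

# Hypothetical total contact areas (A^2) between residue-type pairs in a
# structure database, versus a reference (expected-by-chance) model.
observed = {("L", "L"): 900.0, ("L", "K"): 150.0, ("K", "D"): 400.0}
reference = {("L", "L"): 500.0, ("L", "K"): 300.0, ("K", "D"): 250.0}

obs_total = sum(observed.values())
ref_total = sum(reference.values())

# Inverse-Boltzmann potential: -ln(observed frequency / reference frequency).
potential = {pair: -math.log((observed[pair] / obs_total) /
                             (reference[pair] / ref_total))
             for pair in observed}

def score(contacts):
    """Energy of a model: sum of pair potentials weighted by contact area."""
    return sum(area * potential[pair] for pair, area in contacts)

native = score([(("L", "L"), 120.0), (("K", "D"), 80.0)])
decoys = [score([(("L", "K"), 120.0), (("K", "D"), 80.0)]),
          score([(("L", "L"), 40.0), (("L", "K"), 160.0)])]
mean = sum(decoys) / len(decoys)
sd = (sum((d - mean) ** 2 for d in decoys) / len(decoys)) ** 0.5
print("Z-score:", (native - mean) / sd)  # native scores lower (better) than decoys
```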

    Parrondo's Paradox for Games with Three Players

    Parrondo's paradox, which arises in game theory, asserts that playing two losing games, A and B say, randomly or periodically may result in a winning expectation. In the original paradox the strategy of game B was capital-dependent. Several extended versions of the original Parrondo's game, such as history-dependent and cooperative Parrondo's games, have been introduced; in all of them, the games are played by two players. In this paper, we introduce a generalized version of the paradox with three players. In our extension, two games are played among three players by throwing a three-sided die, and each player occupies one of three places in the game. We set up conditions on the parameters under which player one finishes in third place in both games A and B; the paradoxical property is then obtained by combining these two games periodically or chaotically, so that player one finishes in first place when playing the games in one of these fashions. A mathematical analysis of the generalized strategy is presented, and the results are also justified by computer simulations.
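    The three-player games depend on parameters the abstract does not list, so the following simulation instead illustrates the classic one-player, capital-dependent Parrondo pair that the paper generalizes: games A and B each lose on their own, yet randomly mixing them wins.

```python
# The classic capital-dependent Parrondo games (standard textbook parameters),
# shown here as background for the paper's three-player generalization.
import random

EPS = 0.005

def play_A(capital):
    """Game A: a slightly unfair coin flip."""
    return capital + (1 if random.random() < 0.5 - EPS else -1)

def play_B(capital):
    """Game B: a bad coin when capital is divisible by 3, a good coin otherwise."""
    p = (1 / 10 - EPS) if capital % 3 == 0 else (3 / 4 - EPS)
    return capital + (1 if random.random() < p else -1)

def simulate(strategy, rounds=50_000, trials=20):
    total = 0
    for _ in range(trials):
        capital = 0
        for _ in range(rounds):
            capital = strategy(capital)
        total += capital
    return total / trials  # mean final capital

random.seed(1)
print("A only :", simulate(play_A))   # negative drift
print("B only :", simulate(play_B))   # negative drift
print("random :", simulate(lambda c: play_A(c) if random.random() < 0.5 else play_B(c)))  # positive
```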

    A tale of two symmetrical tails: Structural and functional characteristics of palindromes in proteins

    Background: It has previously been shown that palindromic sequences are frequently observed in proteins. However, our knowledge of their evolutionary origin and possible importance is incomplete. Results: In this work, we revisit this relatively neglected phenomenon and address several questions. (1) Palindromes are likely to be found in low-complexity sequences (i.e. sequences with extreme amino acid usage bias); what is the role of sequence complexity in the evolution of palindromic sequences in proteins? (2) Do palindromes coincide with conserved protein sequences, and if so, what are the functions of these conserved segments? (3) For conserved palindromes, is the whole conserved pattern always symmetrical? (4) Do palindromic protein sequences form regular secondary structures? (5) Does sequence similarity of the two "sides" of a palindrome imply structural similarity? For the first question, we showed that the complexity of palindromic peptides is significantly lower than that of randomly generated palindromes. One can therefore say that palindromes occur frequently in low-complexity protein segments, without necessarily having a defined function or forming a special structure. Nevertheless, this does not rule out the possibility of palindromes that play a role in protein structure and function; indeed, we found several palindromes that overlap with conserved protein Blocks of different functions. In many cases, however, we failed to find any symmetry in the conserved regions of the corresponding Blocks. Furthermore, to answer the last two questions, we studied the structural characteristics of palindromes and found that they may have a great propensity to form α-helical structures. Finally, we demonstrated that the two sides of a palindrome generally do not show significant structural similarity. Conclusion: We suggest that the puzzling abundance of palindromic sequences in proteins is mainly due to their frequent concurrence with low-complexity protein regions, rather than a global role in protein function. In addition, palindromic sequences show a relatively high tendency to form helices, which might play an important role in the evolution of proteins that contain palindromes. Moreover, reverse similarity in peptides does not necessarily imply significant structural similarity, which rules out the importance of palindromes for forming symmetrical structures. Although palindromes frequently overlap with conserved Blocks, we suggest that this overlap is coincidental rather than reflecting involvement in a particular structural fold or protein domain.
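    As a simple illustration of the objects studied here (not the authors' pipeline), the sketch below scans a protein sequence for palindromic peptides and reports the Shannon entropy of each hit as a crude complexity measure, with low entropy flagging the kind of low-complexity segments the paper discusses.

```python
import math
from collections import Counter

def entropy(peptide):
    """Shannon entropy in bits; low values indicate low-complexity segments."""
    counts = Counter(peptide)
    n = len(peptide)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def find_palindromes(seq, min_len=5):
    """All windows of length >= min_len that read the same forwards and backwards."""
    hits = []
    for length in range(min_len, len(seq) + 1):
        for start in range(len(seq) - length + 1):
            pep = seq[start:start + length]
            if pep == pep[::-1]:
                hits.append((start, pep, entropy(pep)))
    return hits

# Toy sequence containing a low-complexity palindrome (AAQAA) and a mixed one (LKVKL).
for start, pep, h in find_palindromes("MSAAQAAGTLKVKLDE"):
    print(start, pep, round(h, 2))
```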

    Global haplotype partitioning for maximal associated SNP pairs

    Background: Global partitioning based on pairwise associations of SNPs has not previously been used to define haplotype blocks within genomes. Here, we define an association index based on LD between SNP pairs and use Fisher's exact test to assess the statistical significance of the LD estimator. By this test, each SNP pair is characterized as associated, independent, or not statistically significant. We set limits on the maximum acceptable proportion of independent pairs within all blocks and search for the partitioning with the maximal proportion of associated SNP pairs. Essentially, the model reduces to a constrained optimization problem, whose solution is obtained by iterating a dynamic programming algorithm. Results: In comparison with other methods, our algorithm reports blocks of larger average size, yet the haplotype diversity within the blocks is captured by a small number of tagSNPs. Resampling HapMap haplotypes under a block-based model of recombination showed that our algorithm is robust in reproducing the same partitioning for recombinant samples. In a case-control association study aimed at mapping a single-locus trait, with simulation results evaluated by a block-based statistical test, our algorithm performed better than previously reported models, and among haplotype block partitioning methods it performed best at detecting recombination hotspots. Conclusion: Our proposed method divides chromosomes into regions within which the allelic associations of SNP pairs are maximized. This approach presents a natural design for dimension reduction in genome-wide association studies. Our results show that the pairwise allelic association of SNPs can describe various features of genomic variation, in particular recombination hotspots.
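    A much-simplified sketch of the scheme (the paper's iterated, constrained dynamic program is more involved, and its three-way pair labels are collapsed here into a two-way call): Fisher's exact test labels SNP pairs, and a dynamic program chooses block boundaries that maximize the number of associated pairs while rejecting blocks with too many non-associated ones.

```python
from itertools import combinations
from scipy.stats import fisher_exact

ALPHA = 0.05                  # significance level for Fisher's exact test
MAX_NONASSOC_FRACTION = 0.2   # tolerated fraction of non-associated pairs per block

def is_associated(haplotypes, i, j):
    """Fisher's exact test on the 2x2 table of 0/1 alleles at SNPs i and j."""
    table = [[0, 0], [0, 0]]
    for h in haplotypes:
        table[h[i]][h[j]] += 1
    _, p = fisher_exact(table)
    return p < ALPHA

def partition(haplotypes, n_snps):
    assoc = {(i, j): is_associated(haplotypes, i, j)
             for i, j in combinations(range(n_snps), 2)}

    def block_score(lo, hi):
        """Number of associated pairs in block [lo, hi], or None if invalid."""
        pairs = list(combinations(range(lo, hi + 1), 2))
        if not pairs:
            return 0  # single-SNP block: trivially valid
        n_assoc = sum(assoc[p] for p in pairs)
        if (len(pairs) - n_assoc) / len(pairs) > MAX_NONASSOC_FRACTION:
            return None
        return n_assoc

    # dp[k] = (best score, block list) over the first k SNPs
    dp = [(0, [])] + [None] * n_snps
    for k in range(1, n_snps + 1):
        for j in range(k):
            s = block_score(j, k - 1)
            if s is None or dp[j] is None:
                continue
            cand = (dp[j][0] + s, dp[j][1] + [(j, k - 1)])
            if dp[k] is None or cand[0] > dp[k][0]:
                dp[k] = cand
    return dp[n_snps]

# Toy data: SNPs 0 and 1 in perfect LD, SNPs 2 and 3 essentially random.
haps = [(0, 0, 1, 0), (0, 0, 0, 1), (0, 0, 1, 1), (0, 0, 0, 0), (0, 0, 1, 0),
        (1, 1, 0, 1), (1, 1, 1, 0), (1, 1, 0, 0), (1, 1, 1, 1), (1, 1, 0, 1)]
print(partition(haps, 4))  # expect a block containing SNPs 0-1
```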

    A Behavioural Bayes Method for Determining the Size of a Clinical Trial

    In this paper we introduce a fully Bayesian approach to sample size determination in clinical trials. In contrast to the usual Bayesian decision-theoretic methodology, which assumes a single decision maker, our approach recognises the existence of three decision makers, namely: the pharmaceutical company conducting the trial, which decides on its size; the regulator, whose approval is necessary for the drug to be licensed for sale; and the public at large, who determine ultimate usage. Moreover, we model the subsequent usage by plausible assumptions about actual behaviour, rather than assuming that it represents decisions which are in some sense optimal. The results, not surprisingly, show that the optimal sample size depends strongly on the expected benefit from a conclusively favourable outcome and on the strength of the evidence required by the regulator.

    How large should a clinical trial be?

    One of the most important questions in planning medical experiments to assess the performance of new drugs or treatments is how big to make the trial; in its statistical formulation, the problem is to determine the optimal size of a trial. The most frequently used methods of determining sample size in clinical trials are based on the required p-value and the required power of the trial for a specified treatment effect. In contrast to the Bayesian decision-theoretic approach, they involve no explicit balancing of the cost of a possible increase in the size of the trial against the benefit of the more accurate information it would give. In this work we consider a fully Bayesian (decision-theoretic) approach to sample size determination in which the number of subsequent users of the therapy under investigation, and hence the total benefit resulting from the trial, depend on the strength of the evidence the trial provides. Our procedure differs from the usual Bayesian decision-theory methodology, which assumes a single decision maker, by recognizing the existence of three decision makers, namely: the pharmaceutical company conducting the trial, which decides on its size; the regulator, whose approval is necessary for the drug to be licensed for sale; and the public at large, who determine the ultimate usage. Moreover, we model the subsequent usage by plausible assumptions about actual behaviour, rather than assuming that it represents decisions which are in some sense optimal. For this reason the procedure may be called "Behavioural Bayes" (or BeBay for short), the word Bayes referring to the optimization of the sample size. The BeBay methodology maximizes the total expected benefit from carrying out the trial minus the cost of the trial. For any additional sales to occur as a result of the trial, it must provide sufficient evidence both to convince the regulator to issue the necessary licence and to convince potential users that they should use the new treatment. The necessary evidence takes the form of a high probability, after the trial, that the new treatment achieves a clinically relevant improvement over the alternative treatment. The regulator is assumed to start from a more sceptical and less well-informed view of the likely performance of the treatment than the company carrying out the trial. The total benefit from a conclusively favourable trial is assessed on the basis of the size of the potential market, aggregated over the anticipated lifetime of the product with appropriate discounting for future years.
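    A hedged numerical sketch of the BeBay idea, with all priors, costs, and thresholds invented for illustration (the papers derive these quantities formally): the expected net benefit of a trial with n patients per arm is the market benefit times the probability of a conclusive result minus the cost of the trial, where "conclusive" means the regulator's sceptical posterior probability of a clinically relevant effect exceeds a threshold.

```python
# Monte Carlo sketch with invented numbers; normal outcomes with known SD assumed.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

SIGMA, DELTA = 1.0, 0.2           # per-patient SD (known), clinically relevant effect
COMPANY = (0.4, 0.2)              # company's prior on the effect: N(mean, sd)
REGULATOR = (0.0, 0.5)            # regulator's more sceptical prior: N(mean, sd)
BENEFIT, COST_PER_PATIENT = 1e6, 500.0
CONVINCED = 0.95                  # required posterior P(effect > DELTA)
SIMS = 20_000

def prob_conclusive(n):
    """Effect drawn from the company prior, trial run with n patients per arm;
    success when the regulator's posterior P(effect > DELTA) exceeds CONVINCED."""
    theta = rng.normal(*COMPANY, SIMS)
    se = SIGMA * np.sqrt(2.0 / n)                   # SE of the two-arm difference
    xbar = rng.normal(theta, se)                    # observed treatment effect
    prec = 1 / REGULATOR[1] ** 2 + 1 / se ** 2      # regulator's posterior precision
    post_mean = (REGULATOR[0] / REGULATOR[1] ** 2 + xbar / se ** 2) / prec
    post_sd = prec ** -0.5
    return np.mean(1 - norm.cdf((DELTA - post_mean) / post_sd) > CONVINCED)

# Maximize expected net benefit over candidate sample sizes.
best = max(range(50, 2001, 50),
           key=lambda n: BENEFIT * prob_conclusive(n) - COST_PER_PATIENT * 2 * n)
print("approx. optimal n per arm:", best)
```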